Find keyword values from PDF [closed]

Posted by JukkaA on Server Fault
Published on 2012-09-27T13:42:49Z

I have a lot of PDF reports that I need to index. They are mostly text-based PDFs, not scanned images. I know each one contains an account number in a fixed format (123456AAAAA) plus other keyword information, such as addresses and customer names, that is needed to index the files. Basically, if the file is ab.pdf, I need to create ab.txt that contains:

ACC=123456AAAA
Customer=John Doe
Date=20120808

What would be the best software/solution to generate indexing information for these?

I know there's pdftotext, but piping its output through various grep/awk commands feels like a hack. It would be nice to be able to specify an area of the PDF to search for the account number, and to specify the format the number is in.
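To make the requirement concrete, here is a rough Python sketch of the pdftotext-based approach, not a finished solution. It assumes the poppler build of pdftotext, whose -x, -y, -W and -H options crop extraction to a rectangle (coordinates in pixels at the -r resolution, 72 DPI by default). The "Customer:" and "Date:" labels, the YYYY-MM-DD date layout, the account pattern of six digits followed by four or five capital letters, and all function names are assumptions made up for illustration; the real reports may look different.

```python
import re
import subprocess
from pathlib import Path

# Assumed formats: adjust these to whatever the real reports contain.
ACCOUNT_RE = re.compile(r"\b(\d{6}[A-Z]{4,5})\b")            # e.g. 123456AAAAA
CUSTOMER_RE = re.compile(r"^Customer:\s*(.+?)\s*$", re.MULTILINE)  # hypothetical label
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")         # hypothetical layout


def pdf_region_text(pdf_path, x, y, width, height, page=1):
    """Extract text from one rectangular area of one page using pdftotext."""
    cmd = [
        "pdftotext",
        "-f", str(page), "-l", str(page),   # first and last page to convert
        "-x", str(x), "-y", str(y),         # top-left corner of the crop area
        "-W", str(width), "-H", str(height),
        "-layout",                          # keep the physical text layout
        str(pdf_path), "-",                 # "-" writes the text to stdout
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def extract_index(text):
    """Return the index fields found in text; missing fields are skipped."""
    fields = {}
    match = ACCOUNT_RE.search(text)
    if match:
        fields["ACC"] = match.group(1)
    match = CUSTOMER_RE.search(text)
    if match:
        fields["Customer"] = match.group(1)
    match = DATE_RE.search(text)
    if match:
        fields["Date"] = "".join(match.groups())  # 2012-08-08 -> 20120808
    return fields


def format_index(fields):
    """Render the fields as KEY=value lines, one per line."""
    return "".join(f"{key}={value}\n" for key, value in fields.items())


def index_pdf(pdf_path, region):
    """Write ab.txt next to ab.pdf; region is (x, y, width, height)."""
    pdf_path = Path(pdf_path)
    text = pdf_region_text(pdf_path, *region)
    out_path = pdf_path.with_suffix(".txt")
    out_path.write_text(format_index(extract_index(text)))
    return out_path
```

Something like index_pdf("ab.pdf", (0, 0, 600, 200)) would then search only the top strip of the first page and write ab.txt; the rectangle here is an arbitrary example value. This is still pdftotext plus pattern matching underneath, so a tool that handles the area and format selection natively would be preferable.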
